Optimal unbiased estimation for expected cumulative discounted cost

Related resources

Optimal Expected Discounted Reward of a Wireless Network with Award and Cost

In this paper, we extend our previous optimization investigation on a single cell (IEEE Transactions on Wireless Communications, vol. 8, no. 2, pp. 1038-1044, 2009) to a whole network with multiple cells and further consider the call admission control (CAC) based on the total expected discounted reward. Here, the system will get the award for admitting a call, but will incur a cost for rejectin...

Total Expected Discounted Reward MDPs: Existence of Optimal Policies

This article describes the results on the existence of optimal and nearly optimal policies for Markov Decision Processes (MDPs) with total expected discounted rewards. The problem of optimization of total expected discounted rewards for MDPs is also known under the name of discounted dynamic programming.

Cumulative Risk Estimation for Chemical Mixtures

In reality, humans are always exposed to a combination of toxic substances and seldom to a single agent. Simultaneous exposure to a multitude of chemicals could result in unexpected consequences. The combined risk may lead to greater or less than a simple summation of the effects induced by chemicals given individually. Here, a method is proposed for estimating the cumulative risk which is the ...

Optimal controller/observer gains of discounted-cost LQG systems

The linear-quadratic-Gaussian control paradigm is well-known in literature. The strategy of minimizing the cost function is available, both for the case where the state is fully known and where it is estimated through an observer. The situation is different when the cost function has an exponential discount factor, also known as a prescribed degree of stability. In this case, the optimal contro...

Near-optimal PAC bounds for discounted MDPs

We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finite-state discounted Markov Decision Processes (MDPs). We prove a new bound for a modified version of Upper Confidence Reinforcement Learning (UCRL) with only cubic dependence on the horizon. The bound is unimprovable in all parameters except the size of the state/action space, where it depends lin...

Journal

Journal title: European Journal of Operational Research

Year: 2020

ISSN: 0377-2217

DOI: 10.1016/j.ejor.2020.03.072